feat: Update OpenAI graph runner to return AgentGraphRunnerResult with GraphMetrics #155
Draft
jsonbailey wants to merge 20 commits into jb/aic-2174/graph-tracking-refactor
Conversation
…nvoke() to run()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The new track_tool_calls method at line 413 (with summary storage and dedup guard) was being shadowed by the older method at line 559 (which only fired per-tool events). Merge them into a single method that both stores to the summary and fires per-tool events.
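The merge described above can be sketched as follows. This is a minimal illustrative stand-in, not the SDK's actual tracker: the class and field names here are assumptions, but it shows the shape of a single method that both stores to the summary and fires per-tool events, with the dedup guard preserved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolCallMetric:
    name: str
    success: bool


class Tracker:
    """Minimal stand-in for the tracker in the commit (names are illustrative)."""

    def __init__(self, emit: Callable[[str, bool], None]) -> None:
        self._emit = emit                      # fires one event per tool call
        self._summary: List[ToolCallMetric] = []
        self._tool_calls_tracked = False       # dedup guard

    def track_tool_calls(self, calls: List[ToolCallMetric]) -> None:
        # Single merged method: store to the summary AND fire per-tool events,
        # instead of two same-named methods shadowing each other.
        if self._tool_calls_tracked:           # guard against double tracking
            return
        self._tool_calls_tracked = True
        self._summary.extend(calls)            # summary storage
        for call in calls:                     # per-tool events
            self._emit(call.name, call.success)
```

A second call is a no-op, which is exactly the behavior the dedup guard existed to provide before the shadowing bug hid it.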
Previously, metrics_extractor(result) was called twice — once in the public track_metrics_of/track_metrics_of_async to read duration_ms, and again inside _track_from_metrics_extractor to track success, tokens, and tool calls. Extract metrics once in the public method and pass the resulting metrics + elapsed_ms into the private helper, which now also handles the duration tracking.
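The single-extraction pattern can be sketched like this. The class and field names are hypothetical simplifications of the managed layer; the point is that the extractor runs exactly once in the public method, and the resulting metrics plus elapsed time flow into the private helper, which now owns duration tracking too.

```python
import time
from typing import Any, Callable, Dict, Optional


class MetricsTracker:
    """Illustrative sketch only: extractor is called once, not twice."""

    def __init__(self) -> None:
        self.tracked: Dict[str, Any] = {}

    def track_metrics_of(
        self,
        extractor: Callable[[Any], Optional[dict]],
        fn: Callable[[], Any],
    ) -> Any:
        start = time.monotonic()
        result = fn()
        elapsed_ms = int((time.monotonic() - start) * 1000)
        metrics = extractor(result)            # called exactly once
        self._track_from_metrics(metrics, elapsed_ms)
        return result

    def _track_from_metrics(self, metrics: Optional[dict], elapsed_ms: int) -> None:
        # The private helper now also handles duration tracking.
        self.tracked["duration_ms"] = elapsed_ms
        if metrics is not None:
            self.tracked["success"] = metrics.get("success")
```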
ManagedModel and ManagedAgent now require a Runner. The compat shims (_invoke_runner, isinstance(result, RunnerResult) branches, Union type annotations) are removed; result handling is direct on RunnerResult fields. The deprecated ManagedModel.invoke() is preserved for backwards compat but now delegates to run() and adapts the ManagedResult into the legacy ModelResponse shape. ModelRunner and AgentRunner protocol definitions remain in place so downstream provider packages that import them continue to work.
- Drop the inconsistent 'if metrics else None' guard on reported_ms; the next line already dereferences metrics.success unconditionally.
- Use 'is not None' for tool_calls so an explicit empty list still triggers tracking (preserves the distinction between 'not tracked' and 'tracked with no calls').
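The second bullet is worth pinning down, since a bare truthiness check would silently collapse the two cases. A minimal sketch (the function name is illustrative):

```python
from typing import List, Optional


def should_track_tool_calls(tool_calls: Optional[List[str]]) -> bool:
    # 'is not None' keeps the distinction: None means "not tracked at all",
    # while [] means "tracked, and no tools were called".
    # A truthiness check ('if tool_calls:') would drop the empty-list case.
    return tool_calls is not None
```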
Drop the deprecated invoke() method from the managed layer along with its dedicated test class and the warnings/LDAIMetrics/ModelResponse imports that were only needed by it. Type definitions in providers/ remain so downstream provider packages keep building.
…unner] The factory's downstream consumers (ManagedModel, ManagedAgent) now take Runner; aligning the factory's return types lets us drop the type: ignore comments at the ManagedModel/ManagedAgent call sites. Provider package PRs will update their concrete implementations to match. Judge still takes ModelRunner, so its call site picks up the type: ignore[arg-type] in its place — that's resolved later in the cleanup PR when Judge migrates to Runner.
Move the metrics_extractor call inside _track_from_metrics_extractor so extraction errors are caught and logged without bubbling up. When extraction fails or returns None, only the wall-clock duration is tracked — success/error is left untouched since the underlying model call itself succeeded. Also tighten the tool_calls check to access metrics.tool_calls directly, mirroring how metrics.usage is accessed.
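A sketch of that error-handling behavior, with hypothetical names standing in for the real helper: extraction failures are logged rather than raised, the wall-clock duration is always recorded, and success/error is only written when metrics were actually produced.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def track_from_metrics_extractor(
    extractor: Callable[[Any], Optional[dict]],
    result: Any,
    elapsed_ms: int,
    tracked: Dict[str, Any],
) -> None:
    """Illustrative sketch of the behavior described in the commit."""
    try:
        metrics = extractor(result)
    except Exception:
        logger.exception("metrics extractor failed")   # logged, not re-raised
        metrics = None
    tracked["duration_ms"] = elapsed_ms                # always record duration
    if metrics is None:
        return          # success/error left untouched: the model call succeeded
    tracked["success"] = metrics.get("success")
    if metrics.get("tool_calls") is not None:          # direct access, like usage
        tracked["tool_calls"] = metrics["tool_calls"]
```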
- Judge now accepts Runner instead of ModelRunner
- evaluate() calls runner.run(output_type=...) instead of invoke_structured_model
- response.parsed replaces StructuredResponse.data; None guard added
- evaluate_messages() accepts RunnerResult instead of ModelResponse
- Tests updated to use RunnerResult and mock_runner.run

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
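The parsed/None-guard change above can be sketched as follows. This is not the SDK's Judge implementation; `Verdict` and the fake runner are hypothetical, and only the shape of the call and the guard come from the commit text.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunnerResult:
    content: str
    parsed: Optional[Any] = None   # structured payload, if output_type was set


@dataclass
class Verdict:                     # hypothetical structured output type
    passed: bool


class Judge:
    def __init__(self, runner: Any) -> None:
        self.runner = runner       # any object with run(prompt, output_type=...)

    def evaluate(self, prompt: str) -> Optional[Verdict]:
        # runner.run(output_type=...) replaces invoke_structured_model;
        # response.parsed replaces StructuredResponse.data, with a None guard.
        response = self.runner.run(prompt, output_type=Verdict)
        if response.parsed is None:
            return None
        return response.parsed
```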
…ics]], remove defensive getattr
…nnerResult

- OpenAIModelRunner.run() implements the unified Runner protocol; returns RunnerResult with content, metrics (LDAIMetrics), raw, and parsed fields. Structured output is supported via the output_type parameter.
- OpenAIAgentRunner.run() updated to return RunnerResult; populates tool_calls in LDAIMetrics from observed openai-agents ToolCallItems.
- Legacy invoke_model() and invoke_structured_model() retained as deprecated adapters that delegate to run() and wrap results into ModelResponse / StructuredResponse for backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
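The RunnerResult shape described above might look roughly like this. The field list comes from the commit text; the field types and defaults are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class LDAIMetrics:
    duration_ms: int = 0
    success: bool = True
    tool_calls: Optional[List[str]] = None   # element type is an assumption


@dataclass
class RunnerResult:
    """Unified return type of the Runner protocol's run() method."""
    content: str
    metrics: LDAIMetrics = field(default_factory=LDAIMetrics)
    raw: Any = None      # provider-native response object
    parsed: Any = None   # populated only when output_type was requested
```

Callers read `result.content` for plain text and `result.parsed` for structured output, instead of branching on separate ModelResponse / StructuredResponse types.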
…nner

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… RunnerResult

- LangChainModelRunner.run() implements the unified Runner protocol; returns RunnerResult with content, metrics (LDAIMetrics), raw, and parsed fields. Structured output is supported via the output_type parameter.
- LangChainAgentRunner.run() updated to return RunnerResult; populates tool_calls in LDAIMetrics from observed tool_calls in message responses.
- Legacy invoke_model() and invoke_structured_model() retained as deprecated adapters that delegate to run() and wrap results into ModelResponse / StructuredResponse for backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rit Runner

- LangChainModelRunner: replaces invoke_model/invoke_structured_model with run(input, output_type=None); returns RunnerResult
- LangChainAgentRunner: replaces AgentResult with RunnerResult; run() signature gains optional output_type parameter
- Tests updated to call run() and assert result.content / result.parsed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rResult types

- Add GraphMetrics dataclass (runner-layer return type for graph runs)
- Add GraphMetricSummary dataclass (managed-layer metrics, analogous to LDAIMetricSummary for single-model invocations)
- Add ManagedGraphResult dataclass (managed-layer return type from ManagedAgentGraph)
- Add AgentGraphRunnerResult dataclass (future runner return type, no evaluations field)
- ManagedAgentGraph.run() now returns ManagedGraphResult with GraphMetricSummary built from the runner's AgentGraphResult metrics
- Export all new types from ldai package

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

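A rough sketch of how these types relate, using only the names and roles the commit lists; all field types and defaults below are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GraphMetrics:
    """Runner-layer metrics for a whole graph run."""
    path: List[str] = field(default_factory=list)  # node names in execution order
    duration_ms: int = 0
    success: bool = True
    total_tokens: int = 0


@dataclass
class AgentGraphRunnerResult:
    """Future runner return type; note there is no evaluations field."""
    output: str = ""
    metrics: GraphMetrics = field(default_factory=GraphMetrics)


@dataclass
class GraphMetricSummary:
    """Managed-layer summary, analogous to LDAIMetricSummary for single models."""
    path: List[str] = field(default_factory=list)
    duration_ms: int = 0
    success: bool = True


@dataclass
class ManagedGraphResult:
    """Managed-layer return type from ManagedAgentGraph.run()."""
    output: str
    metrics: GraphMetricSummary
```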
… new runner shape

ManagedAgentGraph.run() now detects the runner result type and dispatches accordingly:

- AgentGraphRunnerResult (new shape): managed layer drives all graph-level tracking from result.metrics (path, duration, success/failure, total tokens) via the graph tracker. Node-level tracking from node_metrics will be wired once runners populate that field (PR 11-openai/langchain).
- AgentGraphResult (legacy shape): tracking already occurred inside the runner; managed layer wraps result without additional tracking.

ManagedAgentGraph now accepts an optional graph parameter (AgentGraphDefinition) used to create the graph tracker. LDAIClient.create_agent_graph() passes the resolved graph definition.

This is a deliberate bridge pattern: the legacy detection branch will be removed once both runners are migrated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
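The bridge dispatch above reduces to an isinstance check. This sketch uses stand-in dataclasses rather than the real SDK types; only the two-branch dispatch and its tracking asymmetry come from the commit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class GraphMetrics:
    path: List[str] = field(default_factory=list)
    duration_ms: int = 0
    success: bool = True


@dataclass
class AgentGraphRunnerResult:   # new shape: managed layer drives tracking
    output: str
    metrics: GraphMetrics


@dataclass
class AgentGraphResult:         # legacy shape: runner already tracked
    output: str


def wrap_result(
    result: Union[AgentGraphRunnerResult, AgentGraphResult],
    flush_graph_tracking: Callable[[GraphMetrics], None],
) -> str:
    # Deliberate bridge: the legacy branch disappears once both runners migrate.
    if isinstance(result, AgentGraphRunnerResult):
        flush_graph_tracking(result.metrics)   # managed layer emits all events
        return result.output
    return result.output                       # legacy: no additional tracking
```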
…h GraphMetrics

Remove all direct LaunchDarkly tracker calls from OpenAIAgentGraphRunner. The runner now collects per-node metrics via _NodeMetricsAccumulator (a lightweight accumulator replacing the per-node LDAIConfigTracker) and returns AgentGraphRunnerResult with populated GraphMetrics (path, duration_ms, usage, node_metrics). Graph-level and per-node tracking events are emitted by ManagedAgentGraph._flush_graph_tracking() from the result.

ManagedAgentGraph._flush_graph_tracking() is extended to also drive per-node tracking from result.metrics.node_metrics using the graph definition's node tracker factories.

Integration tests in test_tracking_openai_agents.py are updated to run through the full ManagedAgentGraph pipeline (ManagedAgentGraph.run()) so tracking events are emitted by the managed layer as intended.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
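The accumulator idea can be sketched as below. This is an illustrative stand-in for _NodeMetricsAccumulator, not its actual implementation; the key property from the commit is that it records per-node metrics locally instead of emitting tracking events, so the managed layer can flush them later.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeMetrics:
    duration_ms: int = 0
    total_tokens: int = 0


class NodeMetricsAccumulator:
    """Lightweight per-node collector (illustrative stand-in)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, NodeMetrics] = {}
        self.path: List[str] = []          # execution order, one entry per visit

    def record(self, node: str, duration_ms: int, tokens: int) -> None:
        # Accumulate instead of calling a tracker: no events fire here.
        self.path.append(node)
        m = self._nodes.setdefault(node, NodeMetrics())
        m.duration_ms += duration_ms
        m.total_tokens += tokens

    def node_metrics(self) -> Dict[str, NodeMetrics]:
        # Snapshot handed to the managed layer for flushing.
        return dict(self._nodes)
```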
Summary
- OpenAIAgentGraphRunner now collects per-node metrics via _NodeMetricsAccumulator, a lightweight collector replacing LDAIConfigTracker inside the runner
- The runner returns AgentGraphRunnerResult with populated GraphMetrics (path, duration_ms, usage, node_metrics)
- Graph-level and per-node tracking events are emitted by ManagedAgentGraph._flush_graph_tracking() from the result metrics
- ManagedAgentGraph._flush_graph_tracking() extended to drive per-node tracking from result.metrics.node_metrics using graph node tracker factories
- Integration tests run through the full ManagedAgentGraph.run() pipeline (tracking events now come from the managed layer)
- track_handoff_success() calls removed (per spec: the path field is sufficient; handoffs are not in GraphMetrics)

Depends on
Test plan
- Tests pass (uv run pytest packages/ai-providers/server-ai-openai/tests/)
- test_openai_agent_graph_runner.py: runner returns new shape, no tracker created
- test_tracking_openai_agents.py: graph-level and per-node events emitted through managed layer

🤖 Generated with Claude Code